Method for recognizing audio-visual data in transmission networks, in particular the Internet

ABSTRACT

The present invention relates to a method for recognizing an audio and/or visual format in a digital transmission network, in particular the Internet, wherein formats consist of a sequence of data that is quasi-continuous or divided into packets, all or parts of said sequence of data are analyzed for the presence of one or more bit patterns, and a notice is given if a determined format can be recognized from an analyzed bit pattern.

[0001] The invention relates to a method for recognizing audio-visual data in transmission networks, in particular the Internet. For monitoring the network it is not necessary to verify the contents of particular data, but only their format. On the Internet today, data files are presented, transmitted and/or downloaded. Data are also offered as so-called live streams, i.e. transmitted in a quasi-continuous way, with live or on-demand audio-visual contents. Various formats from Microsoft, Real Networks, MPEG and others are used. Owing to easier access for Internet surfers and the very large supply of music, the presence of music files in the mp3 format (MPEG 1, 2 and 2.5 Layer 3) in particular has increased exponentially. This coding method is known from the international standards ISO/IEC 11172-3 and 13818-3. The copyrights of the authors and publishers of these music files, as well as other possible rights, are usually not taken into consideration by Internet surfers during downloads, and very often not by the providers either. Other coding methods can be expected to be used in different networks, e.g. MPEG 4 AAC or MPEG 4 Twin VQ/AAC, mp3pro, AACPlus or MPEG 4 Video, as well as proprietary, non-standardized methods of companies which have gained a place on the market, the so-called industry standards.

[0002] Usually mp3 files are available on web servers and can be reached with browsers such as Netscape Navigator, Microsoft Internet Explorer or others, using a URL (Uniform Resource Locator). If the Internet surfer wishes to store an audio or video file on his or her own computer, he or she can usually obtain it by clicking with the mouse on the corresponding name, which indicates a piece of music, a film, or a radio or TV program. The file is transferred from the Internet to the user through a TCP/IP, FTP/IP or UDP/IP protocol, or possibly with RTP or RTSP. Because the application is separated from the IP protocol, the format used in the transmission cannot be determined from the protocol. This also applies to the last router before the Internet surfer.

[0003] There is no means of control for detecting non-permitted audio-visual files or formats and stopping the Internet access or transmission. Consequently the holders of rights, e.g. for texts, music, movies and productions, lose all or part of their income. Locking mechanisms are provided to prevent this, with the effect that corresponding unlocking algorithms are developed and payment for the audio-visual productions is again circumvented.

[0004] It is an aim of the invention to avoid the above-mentioned disadvantages.

[0005] This aim is reached by means of a method having the characteristics according to claim 1, while advantageous implementations are described in the subclaims.

[0006] The invention will now be described in more detail by means of implementation examples.

[0007] The characteristics of a first implementation example are shown in Table 1:

[0008] By means of a search algorithm preset to recognize the bit patterns “11” and “0101”, the beginnings of a 4th and an 8th frame are searched for. The known distance between two “11” patterns of a 4th block and between two “0101” patterns of an 8th block is used as a check to avoid a mistaken interpretation of the data pattern found.

[0009] Data signals can be transmitted with or without error protection. In the latter case, another parameter, namely the number of checks, can be used for deciding whether or not the identification is correct. Furthermore, at least 3 out of 4 frames must show the corresponding data pattern.

[0010] Correct Interpretation

[0011] Bits 1 and 2, bits 13 and 14, and bits 37 and 38 are identified as the “11” data pattern. Because only one position, namely bits 25 and 26, shows “10” instead of “11”, the algorithm treats this as a transmission error.

[0012] False Interpretation

[0013] Bits 2 and 3 as well as bits 14 and 15 are identified as the “11” data pattern. Later, during the check of bits 26 and 27 as well as bits 42 and 43, which show “01” in one case and “10” in the other, the algorithm concludes that these are not the correct beginnings.
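The plausibility check described in paragraphs [0009] to [0013] can be sketched as follows. This is only an illustration under assumptions: Table 1 is not reproduced here, so the spacing of 12 bits is inferred from the bit positions 1, 13, 25 and 37 named above, the 48-bit example string is made up, and the function names are hypothetical.

```python
def count_pattern_hits(bits, start, pattern="11", period=12, frames=4):
    """Count how many of `frames` expected positions carry `pattern`.

    `bits` is a string of '0'/'1' characters and `start` a 0-based
    candidate offset. The period of 12 bits is an assumption.
    """
    hits = 0
    for k in range(frames):
        pos = start + k * period
        if bits[pos:pos + len(pattern)] == pattern:
            hits += 1
    return hits


def is_plausible_start(bits, start, min_hits=3, **kwargs):
    """Accept a candidate if at least `min_hits` of the frames match."""
    return count_pattern_hits(bits, start, **kwargs) >= min_hits


# Hypothetical data: "11" at bits 1-2, 13-14 and 37-38 (1-based),
# but "10" at bits 25-26, mimicking a single transmission error.
EXAMPLE_BITS = ("111" + "0" * 9 + "111" + "0" * 9 +
                "101" + "0" * 9 + "110" + "0" * 9)
```

With this example string, the candidate at bits 1 and 2 (offset 0) reaches 3 of 4 hits and is accepted, whereas the candidate at bits 2 and 3 (offset 1) reaches only 2 hits and is rejected.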

[0014] The special characteristics of the second implementation example are:

[0015] Transmission Medium: Internet

[0016] Used data: MPEG 1/2 Layer 3, also called mp3, coded with 128 Kbit/s

[0017] Transmission protocol TCP/IP

[0018] Sender name (-address) XAV (123.456.789.12)

[0019] Receiver name(-address) MAK (987.654.321.98)

[0020] The structure of an mp3 data stream, the so-called Audio Frame, is, according to ISO/IEC 11172-3 and 13818-3, as shown in FIG. 1:

[0021] The composition of the four segments of data, or fields, namely Header, Error Check, Audio Data and Ancillary Data, is as follows:

[0022] Header, consisting of Syncword, ID, layer, protection_bit, bitrate_index, sampling_frequency, padding_bit, mode, mode_extension, copyright, original/copy, emphasis

[0023] Error check, consisting of Crc_check

[0024] Audio data, consisting of Bit_allocation, scalefactors, samples

[0025] Ancillary data, consisting of Free Data, to be defined by the user.

[0026] In this implementation example, the first variables of the header in particular are used for the analysis: the field Syncword, which by definition always has the contents ‘1111 1111 1111’; ID, which for MPEG use, as in the example, has the contents ‘1’; Layer, which for Layer 3 use, as in the example, has the contents ‘01’; and Bitrate_index, which according to the following table indicates the bitrate used and hence also the length of the frame.

[0027] The Bitrate_index value “1111” is not allowed, because it would collide with the Syncword identification, which has the contents ‘1111 1111 1111’.
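Reading the header fields named in [0026] can be sketched as below. This is a minimal illustration, not the implementation of the invention: the function name and the returned dictionary are hypothetical, the header is assumed to start on a byte boundary, and the bitrate table is the one commonly reproduced for MPEG 1 Layer 3 from ISO/IEC 11172-3, standing in for the table referred to above.

```python
# Bitrates for MPEG 1 Layer 3 in kbit/s, indexed by bitrate_index.
# Index 0 means "free format"; index 15 ("1111") is not allowed.
LAYER3_BITRATES_KBIT = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                        128, 160, 192, 224, 256, 320, None]


def parse_header_prefix(data: bytes):
    """Read the first 20 header bits; return None if they do not fit mp3."""
    if len(data) < 3:
        return None
    word = int.from_bytes(data[:3], "big")   # 24 bits, header bit 1 on top
    syncword = word >> 12                    # 12 bits
    mpeg_id = (word >> 11) & 0b1             # 1 bit
    layer = (word >> 9) & 0b11               # 2 bits
    protection_bit = (word >> 8) & 0b1       # 1 bit
    bitrate_index = (word >> 4) & 0b1111     # 4 bits
    if syncword != 0xFFF or mpeg_id != 1 or layer != 0b01:
        return None                          # not Syncword + ID '1' + Layer '01'
    if bitrate_index == 0b1111:
        return None                          # forbidden value
    return {
        "protection_bit": protection_bit,
        "bitrate_index": bitrate_index,
        "bitrate_kbit": LAYER3_BITRATES_KBIT[bitrate_index],
    }
```

For the byte sequence FF FB 90, the sketch yields bitrate_index 9 (“1001”) and 128 kbit/s, matching the example used in this description.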

[0028] Furthermore other variables, like for instance Bit_allocation or Scalefactors, can be used for the analysis.

[0029] With 128 Kbit/s mp3 encoding and a frame duration of 24 ms, the mp3 Audio Frame has an average length of 128000 bit/s * 0.024 s = 3072 bits = 384 bytes.
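The arithmetic of [0029] can be reproduced with a few lines. The function name is hypothetical, and integer arithmetic in milliseconds is used only to avoid rounding effects.

```python
def frame_length_bits(bitrate_bit_per_s: int, frame_duration_ms: int) -> int:
    """Average frame length in bits for a given bitrate and frame duration."""
    return bitrate_bit_per_s * frame_duration_ms // 1000


bits = frame_length_bits(128_000, 24)
print(bits, bits // 8)   # 3072 384
```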

[0030] The Internet transmission protocols follow the OSI reference model represented in FIG. 2. The application data, here mp3, are located in Layer 1. This must not be confused with Layer 1 of the ISO/IEC MPEG standard.

[0031] The known protocols used on the Internet, here TCP and IP, are located in Layer 4 and Layer 3 respectively. A typical size of the packets transmitted on the Internet is 1.5 Kbyte, which means that the 384-byte Audio Frame used in the example can be contained several times inside one TCP/IP packet, as represented in FIG. 3:

[0032] From FIG. 3 it can be seen that the marking variables contained in an Audio Frame, like Syncword, ID and layer can appear many times in each TCP packet.

[0033] The TCP packets transmitted on the Internet are also identified by a header, which among other things contains the so-called port numbers, which identify the transmitting and receiving applications, and a sequence number, which gives the position of each segment in the data stream.

[0034] The IP packets transmitted on the Internet, here TCP/IP, can also be fragmented if necessary, i.e. split into smaller packets; see FIG. 4.

[0035] Fragmentation can produce TCP/IP packets so small that an mp3 Audio Frame no longer fits completely inside one TCP/IP packet, and there is therefore also the possibility that even variables like Syncword, ID and Layer do not fit in one packet.
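One way to cope with a header that is split across two packets is to carry the last byte of each payload over into the search of the next one. The following sketch assumes that the payloads are already in the correct order and that the header is byte-aligned; the function name is hypothetical, and the 15-bit pattern searched for is Syncword ‘1111 1111 1111’, ID ‘1’ and Layer ‘01’ as described above.

```python
def find_pattern_across_payloads(payloads):
    """Return stream offsets where Syncword + ID + Layer start.

    A match is the byte 0xFF followed by a byte whose upper seven
    bits are 1111 101 (i.e. 0xFA or 0xFB).
    """
    hits = []
    carry = b""          # last byte of the previous payload
    base = 0             # stream offset of the current payload
    for payload in payloads:
        buf = carry + payload
        start = base - len(carry)
        for i in range(len(buf) - 1):
            if buf[i] == 0xFF and (buf[i + 1] & 0xFE) == 0xFA:
                hits.append(start + i)
        base += len(payload)
        carry = buf[-1:]
    return hits
```

Carrying over a single byte suffices here because the pattern spans only two bytes; a longer pattern would need a correspondingly longer carry.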

[0036] Finally, many different packets belonging to several applications can be found at the selection server for an Internet surfer, for instance as represented in Tab. 3:

[0037] The corresponding contents are unknown to the router located at the selection server. It only ensures that the Internet surfer connected to it receives the IP packets addressed to him, independently of the relevant source, application, size, etc. The router also does not arrange the packets in the correct sequence; this task is performed by the receiving application of the Internet surfer.

[0038] The method according to the invention analyzes the bit contents of the IP packets before the file is delivered to the Internet surfer, documents sender and receiver, verifies the legality, and blocks the delivery if necessary.

[0039] The mp3 files or mp3 live streams are characterized in that, in Layer 1, the bit transmission layer, the sequence of bits consisting of Syncword, ID and Layer leads to the pattern represented in FIG. 5:

[0040] The mp3 data stream is divided into packets and provided with an IP protocol, e.g. TCP, UDP or FTP. The packets usually have a size of 1500 bytes, i.e. 12000 bits. With the frame size of 384 bytes given above, it can be assumed that an IP packet contains at least one header.

[0041] The method according to the invention stores one or more packets in memory, depending on the memory size. The protocol used for the transmission is analyzed to determine the addresses of the sender and the addressee, which are stored separately. The data belonging to the application are then isolated from the bit transmission layer (Layer 1) and, if necessary, put in the correct sequence. The bit pattern shown in FIG. 5 is then searched for in this isolated sequence of application data. If the analysis is positive, it is repeated a few times to ensure that the format is indeed the one searched for.
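The steps of [0041] can be sketched for the simplest case. The sketch assumes unfragmented IPv4 packets carrying TCP, performs no validation of the headers, and uses hypothetical function names; the number of repetitions (three) is likewise an assumption, since the description only speaks of "a few times".

```python
import socket


def split_ipv4_tcp(packet: bytes):
    """Return (sender, receiver, application data) of an IPv4/TCP packet."""
    ip_header_len = (packet[0] & 0x0F) * 4          # IHL in 32-bit words
    sender = socket.inet_ntoa(packet[12:16])
    receiver = socket.inet_ntoa(packet[16:20])
    segment = packet[ip_header_len:]
    tcp_header_len = (segment[12] >> 4) * 4         # data offset in words
    return sender, receiver, segment[tcp_header_len:]


def contains_mp3_pattern(payload: bytes) -> bool:
    """True if Syncword, ID '1' and Layer '01' occur byte-aligned."""
    return any(payload[i] == 0xFF and (payload[i + 1] & 0xFE) == 0xFA
               for i in range(len(payload) - 1))


def format_confirmed(packets, required_hits=3) -> bool:
    """Repeat the analysis over several packets before reporting a format."""
    hits = sum(contains_mp3_pattern(split_ipv4_tcp(p)[2]) for p in packets)
    return hits >= required_hits
```

Requiring several positive packets reduces the chance that a random occurrence of the two bytes in unrelated data is reported as mp3.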

[0042] Several criteria can be applied for verifying the legality of the transmission. On the one hand, a comparison table, updated more or less regularly, can be available in order to compare the addresses of legal senders of audio-visual contents with the sender address of the current IP packet. If the address is that of a legal sender, no further check of the packets is necessary. A further possibility consists in additionally using, in the analysis method according to the invention, locking mechanisms available in the application, such as watermarking, provided their use is known.
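The comparison with a table of legal senders could look roughly like this. The function name, the return values and the use of a Python set as the comparison table are assumptions made for illustration only.

```python
def delivery_decision(sender: str, format_detected: bool,
                      allowed_senders: set) -> str:
    """Return "forward" or "block" for one analyzed packet."""
    if sender in allowed_senders:
        return "forward"      # legal sender: no further check needed
    if format_detected:
        return "block"        # recognized format from an unlisted sender
    return "forward"          # no audio-visual format recognized
```

For example, with `allowed_senders = {"192.0.2.1"}`, an mp3 packet from 192.0.2.1 is forwarded, while an mp3 packet from any other address is blocked.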

[0043] In order to make a precise statement about the quantity of data requested by the Internet surfer, the analysis is performed in a quasi-continuous way for each IP packet. The results of the analysis are kept in a table, which may e.g. look as shown in FIG. 6.
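A table of analysis results might be kept as sketched below. FIG. 6 is not reproduced here, so the columns (sender, receiver, format, packet count, byte count) and the class name are purely hypothetical.

```python
from collections import defaultdict


class AnalysisTable:
    """Accumulates per-connection results of the packet analysis."""

    def __init__(self):
        self.rows = defaultdict(lambda: {"packets": 0, "bytes": 0})

    def record(self, sender, receiver, fmt, payload_size):
        """Add one positively analyzed packet to the table."""
        row = self.rows[(sender, receiver, fmt)]
        row["packets"] += 1
        row["bytes"] += payload_size

    def total_bytes(self, receiver):
        """Quantity of recognized data delivered to one receiver."""
        return sum(row["bytes"]
                   for (_, rcv, _), row in self.rows.items()
                   if rcv == receiver)
```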

[0044] If further application data are to be analyzed, e.g. the ID3 tag often used in mp3, then these can optionally also be stored: the contents of the mp3 file, the composer, the performer, etc. Such data can be prepared for and used by the relevant organizations (in Germany, GEMA).
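Assuming that the tag meant in [0044] is an ID3 version 1 tag, i.e. the 128-byte block beginning with "TAG" at the end of an mp3 file, the descriptive fields could be read as sketched below. The function name is hypothetical; ID3v1 carries title, artist, album and year but no separate composer field.

```python
def parse_id3v1(tail: bytes):
    """Parse an ID3v1 tag from the last 128 bytes of `tail`, or return None."""
    if len(tail) < 128 or tail[-128:-125] != b"TAG":
        return None
    tag = tail[-128:]

    def text(raw: bytes) -> str:
        # Fields are fixed-width and padded with zero bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
    }
```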

[0045] In the case of a mobile Internet surfer (e.g. with GSM, UMTS, GPRS), it is necessary to guarantee that the analysis is made at the relevant server, which may change when moving from one transmission cell of the network to another, if the server is not always the same. For this purpose, the method according to the invention may advantageously be used in such a way that each server which is activated provides a piece of information indicating the receiving address currently being analyzed.

[0046] To reduce the computing capacity needed for the method according to the invention, the following variant may be used: if the positive analysis would otherwise have to be performed very often, i.e. for each IP packet of the application, then it is possible to

[0047] a) perform the analysis only for each nth packet, or

[0048] b) evaluate only once the distance between two syncwords (1111 1111 1111) from the bitrate and verify the syncword only at said distances.

[0049] Variant b) can also be used for a better identification, as a causal connection exists between the bitrate given by the bitrate index, the length of the transmitted frame, and the distance between two syncwords “111111111111”. In the example (bitrate_index = “1001”, corresponding to 128 kbit/s) the distance between two syncwords “111111111111” is 384 bytes = 3072 bits.
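Variants a) and b) can be sketched as follows. The sketch assumes a reassembled, byte-aligned application data stream and a constant frame length of 384 bytes as in the example; the function names and the number of frames checked are hypothetical.

```python
def sync_at(stream: bytes, pos: int) -> bool:
    """True if the 12-bit syncword 1111 1111 1111 starts at byte `pos`."""
    return (pos + 1 < len(stream)
            and stream[pos] == 0xFF
            and (stream[pos + 1] & 0xF0) == 0xF0)


def syncwords_at_expected_distance(stream: bytes, first_sync: int,
                                   frame_len: int = 384,
                                   frames: int = 3) -> bool:
    """Variant b): check the syncword only at multiples of the frame length."""
    return all(sync_at(stream, first_sync + k * frame_len)
               for k in range(frames))


def every_nth(packets, n: int):
    """Variant a): select only each nth packet for analysis."""
    return packets[::n]
```

In variant b) the search over the whole stream is needed only once, for the first syncword; afterwards a single two-byte comparison per frame suffices.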

[0050] The figure shows the arrangement for realizing the method according to the invention. Internet users of very different types can for instance connect to an Internet Service Provider, like t-online or AOL, through the PSTN (Public Switched Telephone Network) (modem) or ISDN, in order to surf the Internet, retrieve information or send e-mails. The flow of information between the different Internet participants, hence also between service providers and Internet users, between one Internet site and another, and between Internet sites and Internet users, is handled by a so-called server, a router. The method according to the invention is therefore implemented in the form of

[0051] a) plug-in card for the router, or

[0052] b) add-on software running on a PC, which analyzes the incoming data.

[0053] During the analysis the data are not decoded, e.g. made audible in the case of music data; rather, the format of the data is analyzed by comparison with a determined pattern, in order to verify a determined format.

[0054] As explained, the bit pattern analysis does not involve decoding the data stream; instead, bit patterns or bit streams are examined, as it were, “from outside”, and if a corresponding bit pattern is found, the format or coding can be inferred without having to decode or actually listen to, e.g., the music contents of the data stream.

1. Method for recognizing an audio and/or visual format in a digital transmission network, namely the Internet, wherein formats consist of a sequence of data that is quasi-continuous or divided into packets, all or part of said sequence of data is analyzed for the presence of one or more bit patterns, a notice is given if a determined format can be recognized from an analyzed bit pattern, and the recognizing of a determined bit pattern in the sequence of data is performed by means of analysis of a synchronizing word (sync-word).
 2. Method for recognizing a format according to claim 1, characterized in that determined bit patterns correspond to determined uses, for instance MP3, MPEG 4 Video.
 3. Method for recognizing a format according to claim 1, characterized in that the address of the sender of the analyzed data stream and the analyzed sequence of data are memorized, if the sequence of data shows a determined data format.
 4. Method according to any one of the preceding claims, characterized in that the memorized sender addresses are stored in a first list and said list is compared with a second list, in which the allowed (legal) addresses are stored.
 5. Method according to any one of the preceding claims, characterized in that the transmission of the sequence of data is interrupted, if it has been established that the address of the sender is not contained in the list of the allowed addresses.
 6. Method according to any one of the preceding claims, characterized in that the address of the receiver of the sequence of data (data stream) is memorized and preferably the duration of the transmission is calculated and memorized.
 7. Method according to any one of the preceding claims, characterized in that the format to be recognized is MP3, mp3 pro, AAC, AAC Plus, MPEG 1 or 2, Layer 2 or Layer 3 or MPEG 2.5 Layer 3.
 8. Method according to any one of the preceding claims, characterized in that the recognizing of a determined bit pattern is additionally performed through a plausibility verification on the basis of the distance between sync-words, which is calculated on the basis of the available frame lengths of a sequence of data.
 9. Method according to any one of the preceding claims, characterized in that the transmitted contents and the proprietor/composer/author/writer are memorized.
 10. Method according to any one of the preceding claims, characterized in that all the found data are table organized and memorized.
 11. Method according to any one of the preceding claims, characterized in that a virtual Internet user (virtual receiver) is simulated, who performs the complete decoding and memorizing of the audio-visual data.
 12. Method according to any one of the preceding claims, characterized in that known methods are additionally used for locking the information before unlocking.
 13. Method according to any one of the preceding claims, characterized in that the analysis takes place at the Internet server.
 14. Method according to claim 13, characterized in that the analysis is made at the Internet server by means of a personal computer.
 15. Method according to any one of the preceding claims, characterized in that the method is realized by means of a plug-in card for the Internet-router.
 16. Method according to any one of the preceding claims, characterized in that at the routers of different Internet servers the prepared lists are continuously compared with one another and/or updated or exchanged.
 17. Method according to any one of the preceding claims, characterized in that for analyzing the bit patterns neither decoding is performed nor a decoder is employed.
 18. Method according to claim 1, characterized in that the recognizing of a determined bit pattern in the sequence of data is performed through the analysis of Header-information, e.g. ID, Layer, Protection-bit, Bitrate-index, etc.
 19. Method for recognizing an audio and/or visual format in a digital transmission network, namely the Internet, wherein formats consist of a sequence of data that is quasi-continuous or divided into packets, all or part of said sequence of data is analyzed for the presence of one or more bit patterns, and a notice is given if a determined format can be recognized from an analyzed bit pattern, wherein all or parts of said sequence of data are isolated from the bit transmission layer (Layer 1) and are further analyzed.
 20. Apparatus for recognizing an audio and/or visual format in a digital transmission network, namely the Internet, implementing the method as defined in one or more of the preceding claims.
 21. System for recognizing an audio and/or visual format in a digital transmission network, namely the Internet, implementing the method as defined in one or more of the preceding claims. 